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random  fields  are  also  dicussed.  Finally,  we  present  an  experimental 
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approximation  scheme  (simulated  annealing). 
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1.  Introduction. 


Recently, there  has  been  a  lot  of  interest  about  the  use  of  Markov  random 
field  (MRF)  models  and  Bayesian  estimation  techniques  for  signal  processing 
tasks,  such  as  image  restoration  [G1.G2];  surface  reconstruction  [Ml],  and  image 
segmentation  [El.Ml],  The  use  of  these  techniques  is  very  attractive,  since  it  provides 
a  unified  framework  for  the  formulation  of  a  variety  of  problems,  and  it  permits 
the  incorporation  of  a  priori  knowledge  about  the  nature  of  the  solutions  that  one 
expects  to  find  in  a  simple  and  elegant  way. 

One  serious  problem  associated  with  their  use,  however,  is  that  it  involves  the 
global  minimization  of  a  non-convex  "energy"  function  of  many  variables,  and 
although  stochastic  approximation  methods,  such  as  "Simulated  Annealing"  [Kl], 
have  been  found  effective  for  finding  a  solution,  their  computational  efficiency 
leaves  much  to  be  desired. 

This  situation  provides  the  motivation  for  trying  to  exploit  the  structure  of  each 
particular  problem  to  find  more  efficient  (possibly  deterministic)  methods  to  find  the 
optimal  estimates.  In  this  paper  we  study  the  simplest  problem  of  this  class:  finding 
the  maximum  a  posteriori  (MAP)  estimate  of  a  one  dimensional  binary  Markov 
random  field,  and  show  that  it  is  indeed  possible  to  find  efficient  deterministic 
algorithms  for  its  solution. 

1.1.  Formulation  of  the  Problem. 


Consider  a  one  dimensional  lattice  with  N  nodes,  and  suppose  that  at  each  node 
j  there  is  a  cell  whose  state  can  be  modelled  as  a  random  variable  Fj,  which  can 
take  only  two  values:  Fj  e  {ko,ki}.  Suppose  also  that  the  conditional  probabilities 
for  the  collection  F  satisfy: 

Pr(Fj  =  fj  |  Fi  =  i^j)  =  PT{Fj  =  fj\Fi  =  fi,  t€[l,N];  |t-j|  =  l) 

that  is,  F  is  a  first  order  MRF.  In  this  case,  it  can  be  shown  [P1.K2]  that  the  joint 
probability  density  of  the  configuration  F  is  given  by  the  Gibbs  distribution: 

P(F  =  /)=  i  «p[-i  e"  W- (.  /.«)]  (1) 

*  Q  »-l 


where  Z  is  a  normalizing  constant,  a  is  a  parameter  and  the  functions  V  are  the 
potentials  of  the  system.  In  particular,  we  will  consider  the  potentials: 


V(/.,/i+1)  =  {,*• 


ifA-A+i 
if  fi  fi+i 


In  this  case,  F  corresponds  to  the  one  dimensional  Ising  model  of  ferromagnetic 
phenomena  for  a  finite  lattice  with  free  boundaries,  and  a  can  be  interpreted  as  the 
natural  temperature  of  the  system. 


Suppose  now  that  we  have  some  noisy  observations  of  a  particular  realization 
/  of  the  field  F.  Our  problem  is  to  find  the  "best"  estimate  for  /  given  these 
observations  and  our  prior  knowledge  about  the  properties  of  /. 

We  will  use  the  following  model  for  the  observation  process  g : 

9i  =  H{fi,  nt) 

where  n,  is  a  white  noise  process  (so  that  n,  is  independent  of  ny  for  all  t  j) 
independent  of  and  is  a  deterministic  function  invertible  with  respect  to  n,-, 
so  that  we  can  write  the  conditional  distribution  of  g  in  the  form: 

P{9  I  /)  =  jr  «xp[-  E 

*=i 

where  Zg  is  a  constant  independent  of  /,  and  <!>*„,  are  deterministic  functions. 

Two  familiar  instances  of  this  model  are:  the  binary  symmetric  channel  with 
error  rate  e,  in  which  case 

♦*(«)  =  +  (i- ~^)(i  -  «)|, 

and  the  case  of  additive  or  multiplicative  white  noise  (not  necessarily  Gaussian). 
(For  additive  white  Gaussian  noise, 


*k{9i)  =  j j}(»  “  *)2) 

Using  Bayes  rule,  we  find  that  the  posterior  distribution  is: 

1  9 }  =  P(9)  -  z  ■  exp^-^  5  M  ~  E 

which  is  also  a  Gibbs  measure.  Since  P{g)  and  Z,  are  constants  for  a  given  set 
of  observations,  the  Bayesian  (MAP)  estimate  for  /  is  found  by  minimizing  the 
"energy  function": 


U{f)  =  ^  /t+i)  +  a  E  */,(»)  (2) 

»=1  i 

In  the  particular  case  of  additive  white  Gaussian  noise,  the  equivalent  problem  (for 
/,  €  {-1, 1})  is  to  minimize: 

m  -  e  (/.-  -  /.»)* + 1  E(/.  - ».)'  (3) 

«  *  » 
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or  equivalently,  minimize 


*  *  t 

where  7  =  £>  corresponds  to  the  signal  to  noise  ratio.  In  physical  terms,  this 
minimization  can  be  interpreted  as  the  problem  of  finding  the  ground  state  (1]  of 
an  Ising  ferromagnet  subject  to  a  spatially  varying  external  magnetic  field  (whose 
magnitude  is  proportional  to  g ),  a  system  which  is  of  current  interest  in  physics. 

What  makes  this  problem  particularly  hard  is  that  the  value  of  each  fj  is 
constrained  to  be  in  the  non-convex  set  {ko,ki}.  If  we  relax  this  condition,  in  the 
case  of  additive  Gaussian  noise,  (3)  becomes  a  convex  function  (since  it  is  a  positive 
definite  quadratic  form),  and  its  (unique)  minimum  can  be  found  efficiently,  for 
example,  by  a  gradient  descent  method.  Alternatively,  we  may  construct  a  linear 
dynamic  system  with  the  same  (exponential)  covariance  function  as  the  process  /, 
and  use  a  Kalman  filter  to  find  the  MMSE  estimate.  However,  it  is  not  clear  how 
to  use  these  relaxed  solutions  to  find  the  correct  (binary)  optimal  estimate.  Instead, 
we  will  now  present  two  algorithms  for  minimizing  (3)  directly,  which  have  the 
additional  advantage  of  being  able  to  handle  other  (non  Gaussian)  noise  models. 
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2.  Dynamic  Programming  Formulation. 


In  this  section  we  present  an  algorithm  for  finding  the  global  minimum  of 
(2),  which,  based  on  dynamic  programming  principles,  reduces  the  problem  to  a 
sequence  of  one  dimensional  optimizations. 

As  we  will  see,  this  algorithm  generates,  as  a  byproduct,  a  family  of  solutions 
which  can  be  considered  as  descriptions  of  the  field  /  at  different  scales,  so  that  the 
coarse  descriptions,  which  are  computed  very  fast,  are  progressively  refined  until 
the  optimal  (finest  scale)  configuration  is  found. 

This  approach  is  based  on  the  following  idea: 

A  configuration  /  is  completely  characterized  by  the  value  of  /lt  and  the  set 
Ln  defined  by: 

Ln  =  {L  :  fL^h+ 1}  ;  M  =  n. 

We  will  call  the  n  elements  of  Ln  the  "boundaries"  of  the  configuration  /.  Since 
these  boundaries  correspond  to  odd  bonds  between  neighboring  cells,  we  can  define 
an  equivalent  energy  function  as: 


with  U(J)  -  £  ♦*{*),  /<  6  {*»,*, }  (4) 

« 

For  a  fixed  n,  U  depends  only  on  the  value  of  /1,  and  on  the  position  of  the  n 
boundaries,  that  is,  on  n  +  l  variables.  To  make  this  dependence  more  explicit,  let 
us  define  the  functions 


G(L)=  £(♦*(*)-♦*»(*/))  (5) 

3= 1 

Let  U0  and  Ux  denote  the  energy  functions  corresponding  to  the  configurations  with 
fi  =  k\  and  k0 ,  respectively,  for  a  given  set  of  boundaries 

Ln  =  {Li, . .  .Ln},  L\  <  ...  <  Ln 


We  have  that,  for  n  even, 

Uo(n,  Ln)  =  n  4-  *k0{9j)  +  E  **.(?;)  +  •  •  •  +  E  *k.(w)l  = 

*  j  =  l  L>+ 1  I^.+l 

—  n  +  « [G{L\)  -  G[Li)  +  . . .  -  G(Ln)  +  E  **«(0;)1 

z  i=i 


(6) 


> 


Ui{n,Ln)  =  n  +  **,(jy)  +  £  $fco(?/)  +  ■  •  •  +  E  ^i(9j)]  = 

4  y-i  i»+i 


N 

E 

L,+l 


n  +  -[-G(X,)  + . . .  +  G(I*)  -  GfAT)  +  £  •*(*)] 


(7) 


y=i 


C 


and  for  n  odd. 


a 


» 


» 


» 


Wb(nf  Ln)  —  n  +  —\G(Li)  —  G[L2 )  +  .  •  •  +  G(Ln)  —  G(N)  +  ^2  $*0(ffy)] 

4  y=i 

Ui(n,  Ln)  =  n  +  —  [-G(Li)  +  . . .  -  G(Ln)  +  £  $*•(?/)]  (8) 

1  y= i 

(Note  that  Ey  does  not  depend  on  /). 

Let  si.°\  Sn1  be  the  sets  of  boundaries  that  minimize  UQ  and  Uu  respectively. 
Then,  the  optimal  energy  for  a  given  n  is: 

u'n  =  min[U0(n,  S®),  U(n,  S™)]  (9) 

We  will  define  S„  to  be  the  corresponding  optimal  set  of  boundaries. 

The  determination  of  sW  is  an  n-dimensional  optimization  problem.  However, 
as  we  will  show  below,  it  is  possible  to  decompose  it  into  a  sequence  of  one 
dimensional  optimizations  using  a  dynamic  programming  formulation.  With  this 

approach  we  also  get,  as  a  bonus,  the  solutions  Sy . S^,  Jfc  e  {0,1},  and 

the  corresponding  optimal  energies.  If  we  set  n  =  N,  the  solution  to  the  original 
problem  (3),  £/*(n*,5n*)  can  then  be  found  by  a  one  dimensional  search.  This 
strategy,  however,  can  be  dramatically  improved  by  the  use  of  the  following  facts: 

(i)  We  can  reduce  substantially  the  search  space  for  the  location  of  the  optimal 
boundaries  Lj  €  Sn*. 

(ii)  The  sequences  {C/J,  U*3, . . .}  and  {U\,  U\, . . .}  are  unimodal.  This,  together 
with  the  fact  that  the  dynamic  programming  algorithm  uses  Sy_i  to  compute 
Sj  provides  us  with  an  efficient  stopping  criterion  for  the  computation  of 
the  sequence  {Si, . . Sn*}. 

(iii)  The  expected  value  of  n*  is  usually  small. 

We  will  now  describe  the  algorithm,  and  analyze  each  one  of  these  facts. 

2.1.  Search  Space  for  the  Optimal  Boundaries. 


Let 
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=  {;:  G[j  -  1)  <  G{j)  >  G[j  +  1),  with  G{j  -  1)  ^  G{j  +  1)}  (10) 

Pm  =  {mi,m2l.  ••}  = 

=  O':  G{j  -  1)  >  G[j)  <  G(j  +  1),  with  G(j  -  1)  ^  G[j  +  1)}  (11) 

(Conventionally  we  include  j  =  1  in  Pm,  if  0  <  G(l)  >  G( 2),  and  include  it  in  ?m 
if  o  >  G(l)  <  G{ 2)).  We  define  the  set  P  as 

P  =PM[j  Pm  =  {Pi . Pr} 


(Note  that  P  corresponds  to  the  set  of  places  where  the  sequence  {$*„(?>)- **.($; )) 
changes  sign). 

In  what  follows,  we  will  call  the  elements  of  Pm,  Pm  and  P,  the  "maxima’ , 
"minima",  and  "critical  points"  of  G,  respectively. 

Let  Sn‘+  (Sn-_)  denote  the  subsets  of  Sn-  formed  by  those  boundaries  Ly 
whose  corresponding  term  G(Ly)  has  positive  (negative)  coefficient  in  £/*.,  i.e.,  if 

Sn-  =  SlJ  =  {Lu...Ln‘}, 


then. 


Sn*  +  =  ^3  +fc»  •  •  •} 

5n*_  =  Sn •  -  Sn*+ 


With  these  definitions,  we  have: 


(12) 


Theorem  1:  Sn*+  C  Pm  and  5n-_  C  Pm- 

To  see  why  this  is  true,  let  fML  denote  the  maximum  likelihood  estimate  for 
/  [2]  obtained  by: 

fML  =  ffcl.  if  >  *k0{9j) 

\o,  otherwise 

and  let  /*  be  the  optimal  estimate.  Suppose  that  for  some  j  we  have,  say,  Ly  € 
Sn-+  -  Pm.  Suppose  Ly  €  ( Pk ,  Pk+ 1).  for  some  Pk,Pk+ 1  €  P.  Clearly,  either  Pk  €  Pm 
or  Ffc+i  €  Pm-  Suppose,  for  definiteness  that  Pk  €  Pm - 

If  pk  £  Sn*.  the  configuration  {Li,...Ly_ilPit,Ly+il...Ln*}  has  lower  energy 
than  5n-  (we  decrease  U  without  altering  n),  which  is  a  contradiction.  If  Pk  e  Sn-, 
then  either 

rm,  *  /mm,Li)) 

or /•((£;, ft+i))#/Mt((iy,n+i)) 


6 


and  so,  we  get  a  lower  energy  configuration  by  deleting  Lj  and  either  Pk  or  Pk+X  (we 
decrease  simultaneously  n  and  U).  A  similar  argument  can  be  used  if  Lj  e  (l,Pi) 
or  Lj  e{Pr,N].t 


This  result  means  that  we  can  use  P  to  constrain  the  search  space  for  the 
boundaries  of  each  subproblem  (i.e.,  for  each  fixed  n),  which  now  becomes: 

For  n  <  \P\  fixed,  find Sn  —  {Li,.. . Ln }  with 

Sn+ C  Pm  and  Sn- C  PM  (13) 

such  that  U(n,  5„)  <  U(n,  Ln )  for  all  L„  C  P. 

Note  that  theorem  1  guarantees  that  the  constrained  and  unconstrained  solutions 
will  coincide  only  for  n  =  n\  so  that  for  n^n\Sn  may,  in  general,  be  suboptimal. 

2.2.  Dynamic  Programming  (DP)  Algorithm. 

F  om  equations  (7)  and  (8),  it  is  clear  that,  for  any  fixed  n,  the  determination 
of  the  optimal  (constrained)  configurations  sL° 1 ,  Sn  ’  is  equivalent  to  the  solution  of 
the  optimization  problems: 

For  Sl0): 

Minimize  [G(LX)  -  Gfo)  +  ...] 
with  Li,L3>...  €  Pm,  and  L2,Lit...e  PM. 

For  Si0: 

Maximize  [G(Li)-G(L2)  +  ...] 
with  Li,L3,...e  Pm .  and  Li,LA,...e  Pm. 

Let  us  consider  the  maximization  problems.  Assume,  for  definiteness  that  the 
first  critical  point  of  G  is  a  maximum,  i.e.,  M\  <  mi,  and  define  the  sequences: 

D\{k)  =  sup  G(A/,) 

»>* 


!!(*)  =  (min  L  :  G{Mi)  —  Dx(k)},  k  =  l...\PM\  (14) 

Clearly,  M/,,( q  is  the  optimal  location  of  the  boundary  for  n  =  l  (i.e., 
Sj1)  —  {M/„(i)}),  and  from  Z?i(l)  we  can  easily  compute  the  corresponding  energy. 
We  now  define,  for  j  >  l: 

D2j{k)  =  sup{D2;_((t  +  1)  -  G(m,)} 
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and 


D2]+\{k)  =  sup{D2 ;(t)  -  G(m,')} 

%>k 


Lij{k)  =  {min  L  :  D2j{k)  —  D2j-\[L  +  1)  —  G[Mi)} 

^2j+i[k)  =  {minL  :  Z?2y+j(^)  =  Aj j{L)  +  G[Mi)}  (15) 

One  can  check  that,  for  n  odd, 

&n]  =  •  •  •»  1))...)}  (16) 

and  the  optimal  energy  is: 

U\[n)  =  n  +  ^  [-Un(l)  +  X)  ^Jfco(ffy)]  (17) 

1  i 

For  n  even,  we  define: 

D\{k)  =  sup{-G(m,)} 

»>* 

Lj(fc)  —  {min  L  :  D^fc)  =  — G(mi)} 

£>2y(fc)  =  sup{I>2y-i(*)  + 

%>k 

L'2j {k)  =  {min  L  :  I>2y(A:)  =  D2j-i{L)  +  G'(Mi)} 

D2j+i  sup{Z>2y(*  +  1)  -  ^(m,)} 

\>k 

L2j+\[k)  =  (min  ^  :  D2j+  l(fc)  =  ^2>  ~  G(mi.))  (18) 

and  get: 

S(nl)  =  {ML"(1), . .  •mL;{Ln(...L;(1))...)}  (19) 

U\(n)  =  n  +  f  H>;(1)  -  G(N)  +  £  <1>*0(<7,)]  (20) 

1  i 

For  the  minimization  problems,  that  is,  for  the  computation  of  sL0),  assuming  again 
that  M\  <  mi,  we  have,  for  n  even: 

d\  {k)  =  inf  {-G(m,)} 

%>k 

li(k)  =  {m\nl  :  d\(k)  =  —  G(m/)} 


and  for  j  >  1, 


d2  j[k)  =  mf  {d2;_i(t)  +  G(Mi)} 
hj{k)  =  {min  l  :  d2y(k)  =  d2/_i(0  +  G(Mi)} 


(21) 


<*2,'+i(k)  =  mf- {d2j{i  +  1)  -  G(mt)} 

hj+i[k)  =  {min  l  :  d2j+i{k)  =  d2](l  +  1)  -  G(mi)} 

The  solutions  are: 

5n)  =  -  •  -*rrl/,(/2(...(/f.(i)).-)} 

%(«)-»  +  £Wl)  +  £  **(*)]  (22) 

For  n  odd: 

d\(k)  =  inf 

4>M  =  )“f  {4,-l(*  +  *)  -  G(m,)} 

4,+lW  =  taf  {4,(0  +  C(M,)}  (23) 

with  the  corresponding  definitions  for  l’j(k).  The  solutions  are: 

sf1  =  {MtM, . . M, 

t/o(n)  =  n  +  |«{1)  -  G{N)  +  ^  (24) 

The  case  for  which  mi  <  is  treated  in  a  similar  way. 

The  recursions  (15),  (18),  (21)  and  (23),  together  with  equations  (9)  and  (10), 
allow  us  to  compute  the  sequences  {Si,S2,...}  and  {U\,U2,...}  using  only  one 
dimensional  optimizations.  We  now  turn  to  the  problem  of  determining  the  optimal 
value  n*  for  the  number  of  boundaries. 

2.3.  Stopping  Criterion. 

In  this  section  we  prove  the  following: 

Theorem  2.  Suppose  that  every  (constrained)  optimal  configuration  in  the  sequence 
{5i,52,.. .}  is  unique  (i.e.,  for  every  n,  if  5n  ^  S„,  and  S’n  C  P,  then  U(n,S'n)  > 
t/*)  and  that  for  some  n,  t/*+2  >  U*n.  Then,  U*n+2k  >  U’n,  for  all  fc  >  l. 

This  result  will  provide  us  with  an  efficient  stopping  criterion  for  the  dynamic 
programming  recursions  described  in  the  previous  section;  since  the  first  local 
minima  for  the  subsequences  and  {U*2,U\,...}  are  the  global  ones, 

we  can  terminate  the  computations  once  we  have  found  them. 

To  prove  the  theorem,  we  will  need  the  following  lemmas: 


Lemma  1.  Let  Sk  =  {L\ . Lk)  and  Sk+ 2  —  {Lj ,...Lk+2}  be  the  optimal 

boundaries  (with  corresponding  configurations  fk  and  fk+2)  for  n  =  k  and  n  = 
k  +  2,  respectively.  Suppose  that  k  +  2  <  \P\.  Then,  Sk  C  Sk+2  (i.e.,  Sk+2  is  a 
refinement  of  Sk),  provided  Sk  is  unique. 

Proof: 

We  will  assume  that  for  some  j,  Lj  eSk-  Sk+2,  and  arrive  at  a  contradiction. 
We  consider  three  cases: 

Case  1:  Suppose  that  for  some  *, 

i4  i.;,-, in  s* =« 

In  this  case,  we  claim  that  we  can  find  some  index  p  such  that 

14.  Ctin  si 


fk+2((Lp,Lp+l))  fk((Lp i^p+i]) 

Suppose  that  this  is  not  the  case.  Then,  l!itL'ux  are  the  only  elements  of  Sk+2 
in  some  interval  (Ly,LJ+1)  (or  in  one  of  the  extreme  intervals  [l,Li),(L*,/V])  and 


Suppose 


By  condition  (13),  we  have  that  ^  L\_x  (otherwise,  would  be  a  local  maximum 
and  minimum  of  G  at  the  same  time).  But  then,  since  Sk  is  optimal,  we  can  find 
a  configuration  with  k  +  2  boundaries  whose  energy  is  lower  than  that  of  Sk+2 , 
by  moving  L{  to  Lj  (or  L’i+1  to  Ly+i),  which  contradicts  the  optimality  of  Sk+2.  A 
similar  argument  holds  if 


This  proves  our  claim. 
So,  suppose  that 


Form 


[l;,l;+1]  C  (1,Li)  or  (L*,N] 


l44+iir>St-9 


/*+2((4ViI)^/‘((44iI)- 

Sk  =  {L\,...,Lp_x,Lp+2,...,Lk+2} 


1 


and  let  f'k  be  the  corresponding  configuration,  chosen  in  such  a  way  that  /*(  1)  = 
fk{  1)  (and  therefore,  f'k{[LplLpf ,])  =  fk{[L’p,L'p+ ,])). 

Let  A  U  be  the  change  in  U  (see  eq.  (4))  associated  with  setting: 
f([Lp,L'p+l])  =  fk+2([i:p,L'p+1}). 

We  have  that 

U(Sk+2)  =  U{S'k)  +  AU. 

Now,  we  put: 

Sk+2  =  •  •  m  Lj,  Lp,  Lp+k, . . Lk }. 

Since  Sk  is  optimal,  we  have  that: 

U(Sk+2)  =  U(Sk)  +  A U>  U(Sk)  +  A U  =  U(S'k+2), 
which  contradicts  the  optimality  of  Sk+2. 


Case  2: 

(M;iui4+2^])ns*=0 

Suppose  that  Li  €  [1,  Li).  We  must  have 

A+2([l,Li))  ^  /fc([l,Li)) 

Otherwise,  if  L\  =  L^,  condition  (13)  generates  a  contradiction;  if  Li  >  L^,  we 
are  in  case  1,  and  if  L\  <  L^,  Sk+2  is  not  optimal,  since  we  get  a  lower  energy 
configuration  by  moving  L\  to  L\. 

So, 

/fc+2([ti  -^l])  /k([t»^l])‘ 

By  a  similar  argument,  we  get  that 

/*+2([^'k+2i  W])  ^  /*([^fc+2»  ^f])- 

Now,  proceeding  as  in  case  1,  we  form: 

$k  =  {L2,...i  Lt+ 


and  let  f'k  be  the  corresponding  configuration,  chosen  in  such  a  way  that  /*(1)  = 
/*(1) 

Let  A U  be  the  change  in  0  associated  with  setting: 

/([^lt^])  =  /*+2(l^ii£2])  nnd 
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k— 

!• 


f{[Lk+\iLk+2])  —  /fe+2([^k+n^t+2]) 

so  that 

£(S*+2)  =  U(S'k)  +  && • 

Now,  we  form: 

Sk+ 2  =  {-^ii  -^i»  *  *  •» 

Since  Sk  is  optimal,  we  have  that: 

U(Sk+2)  =  U(S’k)  +  AU>  U(Sk }  +  At/  =  t/(5;+2), 
which  again  contradicts  the  optimality  of  Sk+2 . 

Odsc  3: 

For  all  t,  [4^+1]fl5k^0. 

and  ([l,Li]U^+2.N])n5fc^0  (*) 

To  make  (*)  hold,  we  must  be  able  to  place  k  boundaries  in  it  +  3  (ovelapping) 
closed  intervals,  without  omitting  any  interval.  Moreover,  since  condition  (13)  must 
hold,  we  cannot  put  Lj  =  L\  and  LJ+i  =  Li+2  for  any  i,j.  But  this  is  impossible; 
so,  our  proof  is  finished,  i 

Lemma  2.  Let  A  Uk  =  U{Sk)-U(Sk+2).  Then, &Vk  <  A  Uk.2,  for  allJfc  e  [3, \P\-2\. 
Proof: 

Consider  the  optimal  configurations  Sk,  Sk+2,  Sk+ <,  and  suppose  that  &Uk+2  > 
A Uk.  Using  lemma  1,  let 

Sk  =  {L\,..:,Lky, 

Sk+2  —  {L\, . . .,  Lj,  L2, . .  .,IJ. 

By  condition  (13)  and  lemma  1,  there  are  only  two  valid  forms  for  Sk+i.  We 
consider  each  case  separately: 

Case  1:  Sk+ 4  is  of  the  form: 

Sk+i  —  {Li, . . .,  Lp, Lj,  2*2,  Lp+\. . Lj,  1^, . .  •} 

(i.e.,  the  refinements  corresponding  to  Sk+2  and  Sk+*  are  disjoint). 

Then,  for 

»  .  _  rr  _ »»  _  »*  « 

^4+2  ~  •  •  »i  Lp,  Lp+ 1, . . .)  Li  1 1*2)  •  •  •/) 


we  have 


U(S'k+2)  -  U(sk)  -  A Uk+2  <  U(Sk)  -  A Uk  =  0(Sk+2), 


which  is  a  contradiction. 


Case  2:  S*+4  is  of  the  form: 

‘S'fc+4  =  {•f'l » •  •  •»  1  i  ■f'l  >  ^<2 1  Lit  •  •  •} 

(i.e.,  S*+4  is  a  subrefinement  of  the  refinement  introduced  by  Sk+2). 
Let 

a  =  —  I7({Lj,  . . Lj, Li,LltLj+i, . . .}  +  U(Sk) 
b  —  U({Li, . . Lj,  Lx  ,L2,Lj+i, . . .}  — 
c  —  —U({L\, . . .,  Lj,  1*2,1*21  Lj+i, . . .}  +  U(Sic) 

We  have  that, 

A  Uk  =  a  +  e  —  b 
A  Uk+2  =  b. 

By  assumption, 

b > a+c—b 

and  therefore, 

AUk  =  a  +  c  —  b  <  °  +  e  <  max(a,  c). 

m 

Now,  let  S'k+2  be  formed  from  Sk  by  the  refinement: 

{L\ ,  Lj,  if  o  =  max(a,  c) 

Z<2,  L2,  if  c  =  max(a,  c) 

Then, 

0(S’k+2)  =  0  (S*)  -  max(a,  c)  <  0(Sk)  -  A Uk  =  £(Sk+2), 
which  is  a  contradiction.! 

Now  we  prove  theorem  2: 

Suppose  U*k+ 2  >  U\.  Then, 

k  +  2+~U(Sk+2)>k+^U(Sk) 

now,  by  lemma  2  we  have: 

U*k+4  =  k  +  4  +  |c/(5*+4)  =  k  +  4  +  |(0(Sfc)  -  AUk+2)  > 
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>k  +  2+  |( U(Sk )  -  &Uk+2)  >  k  +  2  +  ^(U{Sk)  -  &Uk)  = 
—  k  +  2  +  —  0(Sk+2)  =  C/k+2  I 


2.4.  Expected  Value  of  n*. 

First,  we  compute  the  (prior)  probability  density  function  p(n)  for  the  number 
n  of  odd  bonds  in  the  original  field  /. 

Let  JVfc  =  N  - 1  be  the  total  number  of  bonds.  We  can  rewrite  equation  (1) 
as: 

P{u)  =  /)  =  Ie£W~2n)  (25) 

z 

The  total  number  of  configurations  compatible  with  a  given  n  is  2 and  so. 


2CjNxp[l(lV»-2n)) 

= cm/  V 

Vei/“  +  e_l/ay  Ve1/®  +  e-1/“/ 


which  is  a  binomial  distribution.  Therefore, 


gfnj  =  /vj-  —  — — ) 
Var(n]  =  ) 

{el/a+e-l/a/ 


We  note  that  as  a  t  oo,  £[n]  t  Nfc/2,  and  as  a  i  0,  £{n]  |  0  (and  var[n]  |  0) 
exponentially  fast.  This  means  that  if  the  natural  temperature  of  the  system  is  not 
too  high,  we  can  expect  that  n\  the  MAP  estimate  for  n,  to  be  relatively  small. 

2.5.  Relation  to  Multiscale  Filtering. 


An  interesting  characteristic  of  the  DP  formulation  is  that  the  solutions  to 
each  of  the  subproblems  (which  in  fact  correspond  to  a  minimization  of  U  (eq. 
(4))  are  independent  of  the  value  of  the  parameter  a.  The  role  of  this  parameter 
is  to  determine  the  number  of  regions  (n*)  that  will  be  present  in  the  optimal 
configuration.  In  this  sense,  it  can  be  regarded  as  a  "scale"  parameter  that  controls 
the  aggregation  of  the  subregions  into  larger  units,  and  the  algorithm  can  be  used  to 
produce  multiscale  descriptions  (in  the  style  of  the  "fingerprints"  treated  in  [W1,Y1]) 


of  the  input  signals  (several  heuristic  solutions  to  this  problem  have  been  proposed; 
see  for  example  [B3,P2,P3]). 

If  we  interpret  the  algorithm  in  this  way,  it  becomes  natural  to  ask  whether 
a  family  of  linear  operators  can  do  the  same  job  in  a  much  cheaper  way.  Let  us 
formulate  this  question  in  more  precise  form  (in  what  follows,  we  will  consider  a 
"continuous  time"  problem  obtained  from  the  original  one  as  a  limit  when  N  |  oo 
(provided  that  the  observations  are  different  from  0  only  in  a  finite  interval),  since 
it  simplifies  the  notation.  It  should  be  clear  that  the  same  arguments  apply  to  the 
discrete  case). 

Consider  a  family  of  filters  {i^}  with  the  following  properties; 

(i)  Each  Fl(x)  is  a  symmetric  and  non-negative  function  of  x. 

(ii)  For  each  L,  Fl(x)  is  a  decreasing  function  of  |x|,  and  Fi{x)  j  0  as  |x|  T  oo 
fast  enough,  so  that  FL  can  be  approximated  by  a  function  with  finite 
support 

(iii)  All  the  filters  are  normalized: 

/oo 

Fi[x)dx  =  l,  for  all  L. 

-OO 

(iv)  The  filters  become  sharper  as  L  i  0: 

f0  fLt(x)dx  <  fQ  FLi{x)dx 
implies  that  bi>  L\ 


Particular  examples  of  acceptable  families  are: 

(i)  The  family  of  rectangular  boxes  BL\ 

Bdx)  =  /i’  I1!  —  L 

'  lo,  otherwise 


(ii)  The  family  of  Gaussian  Kernels: 


Gl{x)  = 


spin  L 


Suppose  we  convolve  the  function  g{x)  -  \  ( g{x )  is  a  continuous  time 
approximation  to  the  observations)  with  a  set  of  filters  from  the  family  {Fl}> 
If  we  start  with  L  large  enough,  the  function 


will  be  practically  constant,  and  therefore,  it  will  have  no  zeroes.  As  we  decrease  L, 
zero  crossings  of  hL  will  begin  to  appear.  To  each  of  these  zero  crossings,  we  will 
associate  a  boundary,  and  form  the  configurations  Si,  S2, . . .  with  1,2,...  boundaries 
respectively,  that  correspond  to  the  first,  first  two,  etc.  zero  crossings  of  hi  (we  are 
ignoring,  at  this  point,  die  question  of  the  precise  localization  of  these  boundaries. 
With  additional  contraints  on  the  family  {Fi}t  it  is  possible,  in  principle,  to  localize 
them  by  decreasing  L  in  a  continuous  fashion,  and  then  tracing  the  position  of  each 
zero  crossing  to  the  finest  (L  =  0)  level;  see  [Yl].  For  the  moment,  let  us  assume 
that  we  can  identify  the  zero  crossings  of  g  -  £  that  correspond  to  those  of  hL,  for 
all  L). 

The  question  that  we  ask  is  the  following: 

If  Si,S2)...  are  the  optimal  boundary  configurations  produced  by  the  DP 
algorithm.is  it  true  that 

S*  =  Sk 

for  all  fc? 

As  we  now  show,  this  is  not  the  case. 

Consider  the  signal  g( x)  defined  by: 

ff(*)  =  l  , 

for  x  6  [li,  h  +  2a]  (J[l2,  /2  +  2b]  (J[l2  +  46, l2  +  66]  (J 
(J[l2  +  86, l2  -f  106]  U(I2  +  126,  i2  +  146]  (J[l2  +  166,  /2  +  186]  , 

and  g(x )  =  0,  otherwise.  Here,  h,  /2,  a  and  6  are  some  positive  numbers  chosen  in 
such  a  way  that,  if  Lq  is  the  starting  L,  we  take  f2  -  lt  -  a  >  >  Lo,  so  that,  by 
property  (ii),  there  is  no  interaction  between  +  o]  and  [l2,i2  +  186]  (see  figure 
1). 

Suppose  that  the  zero  crossings  corresponding  to  +  a]  appear  first  (as  a 
single  double  zero)  at  L  =  Lu  and  those  corresponding  to  [h,  h  +  186]  at  L  —  L^. 
Then, 

Jq  FLi{x)dx  =  f'  FLl{x)dx  (28) 

fQ  FLi{x)dx  +  J3b  F^tfdx  +  j7b  FLi{x)dx  = 

=  l  F^(x)dx  +  J5b  FL>(x)dx  +  J9i  FlMdx  (29) 


Now,  for  a  >  6,  we  have: 


U({lhl2))  =  l0b 

U({h,l<})  =  8b  +  2a>U({h,l2}) 


and  therefore,  S2  =  {l\ ,  h). 


We  claim  that  we  can  find  some  a,  b  with  a  >  b  such  that 

fQ  Fu(x)dx  <  FLj(x)dx 

If  this  is  true,  we  find,  using  (28)  and  conditions  (iii)  and  (iv),  that  it  implies  that 
L2  >  L\t  and  therefore,  S2  = 

We  now  prove  our  claim: 

Let  a  =  b  +  5,  where  we  choose  e  so  that 

Jb  FLt{x)dx  =  J3i  FLt{x)dx  (30) 

(property  (ii)  guarantees  that  we  can  find  such  <).  From  (29), 

fb  Fin{x)dx  =  f0  FLt[x)dx  +  2  J3i  FLi(x)dx  +  2  Jn  Fu{x)dx 

and  from  (30), 

/„”  n,(x)Jx  =  l~/2  FL,(x)dx  =  f"  FiI(x)dx  -  f'11  Fit[x)dx  = 

fb  fk+i/2  fib  fb+t/2  fib 

=  J0  FLt{x)dx+Jb  FLi{x)dx+2  Jn  FL,{x)dx  =  Jq  FL,{x)dx+2  ]n  FL,[x)dx 


>  Jn  FLt{x)dx  =  Jq  FL,{x)dx  | 


This  result  does  not  mean,  of  course,  that  families  of  linear  filters  cannot  be 
used  for  producing  useful  multiscale  descriptions  of  signals;  it  only  means  that  these 
descriptions  cannot,  in  general,  be  considered  as  MAP  estimates  of  MRF  models. 

It  is  possible,  however,  to  design  non-linear  methods  that  are  guaranteed  to 
find  optimal  estimates,  and  that  are  computationally  much  more  efficient  (although 
less  flexible)  than  the  DP  algorithm.  We  will  present  one  such  method  in  section  3. 

2.6.  Extensions. 

In  this  section  we  present  two  related  problems  which  can,  in  principle,  be 
solved  using  the  DP  approach,  although,  as  we  will  see,  in  a  less  efficient  way. 

2.6.1.  Continuous  Valued  Markov  Random  Fields. 

Let  us  consider  the  problem  of  estimating  a  piecewise  constant  signal  corrupted 
by  additive  white  Gaussian  noise.  We  model  the  signal  {/,}  as  a  MRF  with  potential 

V(/,/i+I)  =  £i  2mi£+1  <31> 

and  global  states  distributed  according  to  (1). 

The  observations  are  given  by: 

9i  =  fi  +  n,- 

where  n  is  a  white  Gaussian  process.  The  Bayesian  (MAP)  estimate  for  /  is  again 
found  by  minimizing  eq.(4): 

t'(/)  =  *  +  ft> 
u  =  Ufi  -  ff.)1 

«'+i 

where  n  is  the  number  of  places  where  /,•  ^  /,+ 1,  and  a  =  jf?.  Note  that  in  this 
case,  fi  is  not  restricted  to  {0, 1},  but  can  take  any  real  value. 

Proceeding  as  we  did  in  section  2,  we  consider  the  sequence  of  subproblems 
obtained  by  putting  n  =  0, 1, 2, . . .. 

For  any  fixed  n,  U  will  depend  only  on  the  n  integer  variables  that  correspond 
to  the  location  of  the  boundaries  between  regions  of  constant  /,  since  given  these 
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boundaries  L  =  {Lu...Ln},  the  optimal  estimate  for  /  on  any  interval  (Li,L,+i] 
(we  put  Lq  —  1  and  L„+i  =  N)  is: 

—  j  7"  9j- 

L'i+l  ~  M  j=Li  + 1 

If  we  define  Gk,i  (for  k  <  l)  as: 

cw  =  (»-2('-*))(rr*  £*)  (32) 


We  get  that: 


U(Ln) 


N  n+1 


Z)  +  Z)  GL,-uLj 

»= 1  J=1 


(33) 


(note  that  £$?  is  a  constant  for  a  given  set  of  observations).  Using  dynamic 
programming  principles,  we  can  now  write  the  recursions: 


F0(k)  =  Gk,N 

Fj+i{k)  =  mf  {Gkti  +  Fj(i)} 

LJ+l(k)  =  (L  :  C?*,L  4-  Fy(L)  =  F;+1(fc)}  (34) 

The  optimal  solution,  for  each  given  n  is: 

Sn  =  {^«(1),  Ln. !(Ln(l)) . Li(La(.  .  .(L„(l)). . .)} 

and  the  corresponding  energy, 

Cf(n,Sn)  =  n  +  |[f;g?  +  Fn(l)]  (35) 

The  solution  to  our  problem  will  be  S„*,  where: 

^(n*,5n.)  =  inf{C/(n,5„)}  (36) 

tt 

Unfortunately,  in  this  case  we  cannot  guarantee  the  unimodality  of  any  subsequence 
of  {t/(S„)}  (although  we  believe  that  the  sequence  will  be  unimodal  in  many  cases) 
and  so,  (36)  has  to  be  computed,  in  principle,  by  an  exhaustive  one  dimensional 
search.  Another  unpleasantness  is  that,  unlike  the  binary  case,  the  search  space  for 
the  variables  L,  cannot  be  reduced  in  any  obvious  way. 

2.6.2.  One  Dimensional  Signal  Matching. 


19 


Another  problem  for  which  this  formulation  may  be  useful  is  the  following: 
Let  /  be  a  one  dimensional  MRF  with  potential  given  by  (31),  that  can  take 


values  on  the  set 


Q  =  {— m,  — m  +  1, ...,— 1,0, 1, 


for  some  positive  integer  m.  Suppose  we  observe  two  binary  sequences  gR,  gi  which 
are  formed  from  /  according  with  the  following  stochastic  model: 

9r(*)  =  ap{') 

gR(i  +  /.).  with  prob.  1  -  e,  if  <f>/(i)  =  0 
gL(i)  =  Bp(i),  with  prob.  c,  if  ^/(t)  =  0  (37) 

Bp(i),  with  prob.  1,  if  —  1 

Here,  Ap  and  Bp  are  independent  Bernoulli  processes  (also  independent  from  /) 
with  density  p  (so,  Ap(i),  Bp(i)  6  {0,1});  e  €  (0,1)  is  the  error  rate,  and  <f>t  is  an 
"occlusion  indicator"  whose  value  depends  deterministically  on  /  in  the  following 
way: 

,  ,  ,  [1,  if  U-k  >  fi  +  for  some  integer  k  G  [0,  m]  ,-gv 

to,  otherwise  1 


A  well  known  instance  of  this  problem  is  the  matching  of  a  row  of  a  random  dot 
stereogram  with  density  p  [Jl],  when  the  components  of  the  stereo  pair  are  corrupted 
by  noise.  In  this  case,  the  use  of  a  MRF  model  for  the  disparity  /  corresponds 
to  a  quantification  of  the  assumption  of  the  existence  of  "dense  solutions"  ([Jl]; 
see  also  [M3]),  and  the  use  of  the  occlusion  indicator  corresponds  to  the  "ordering 
constraint"  [B2J. 

To  formulate  the  estimation  problem,  we  will  consider  the  sequence  gL  as 
"observations",  while  gR  will  play  the  role  of  a  set  of  parameters.  Thus,  we  have 
(assuming,  for  simplicity  that  p  =  \): 

P{9L,{i)  =  k\f,gR)  =  P 9\j{k)  = 

f  1  -  c,  if  4>f[i)  =  0  and  g*(i  +  /,)  =  k 

=  { c,  if  <t>f  [i)  =  0  and  gjt(i  +  /,)  ^  k 

U,  if<M0  =  i 


Putting: 


'Pr(t)  =  -  ln(l  -  e)S(gL(i)  -  gR(i  +  r))  - 
-  In  e(l  -  S[gL{i)  -  gR{i  +  r)) 


where 


if  x  —  0 

otherwise 


I 


we  get  that  the  MAP  estimate  for  /  is  obtained  by  minimizing 

=  n  +  J  El1"  + 11  -  (39) 

The  use  of  the  DP  algorithm  for  minimizing  (39)  is  complicated  by  the  fact  that, 
given  the  boundaries  Ln,  the  optimal  estimate  for  /  in  the  interval  (L,-,  Lt+1]  depends 
on  the  estimate  on  (£,_!,£,],  since  this  last  choice  determines  the  extent  of  the 
occluded  region. 

However,  if  we  assume  that  the  size  of  the  regions  of  constant  disparity  is 
relatively  large  compared  with  the  size  of  the  occluded  areas  (as  it  normally  happens 
in  most  practical  cases),  we  can  estimate  /  given  Ln  using  the  formula: 

/((L,,L,+i])  =  /(L„A+1)  = 

=  ik:  D  %z(*)  -  +  *))  >  J2  %l(»)  ~  9R{i  + /)),  for  all  /  €  Q}  (40) 

i=L%  +  l  i=Li+l 

Defining: 

GM  =  T,  */{*,/)(*)  (41) 

we  can  write  the  dynamic  programming  recursions: 

Fo(fc)  =  Gk,N 

Lo(k)  =  N 

Fj+i[k)  =  mf  {G^,-  4-  Fj(k  +  A)  +  A  ln  2} 

Lj+i  =  {L:Gk,L  +  Fj[L  +  A)  +  A  ln  2  =  FJ+i(A;)} 

with  A  =  min(0,  f{k,  i)  -  /(*',  Ly(t'))  (42) 

The  optimal  location  of  the  boundaries,  for  any  given  n  is: 

Ffi  —  {Ln(l)i  ^n— 1  (^n(l))i  •  •  *1  Li {L%{. .  .(Ln(l)).  .  .)} 

The  optimal  configuration  is  computed  using  (40),  and  the  corresponding  energy, 
using  (38)  and  (39). 


3.  An  Alternative  Algorithm. 


In  this  section  we  present  an  algorithm  that  finds  the  MAP  estimate  for  a 
binary  MRF  in  time  O(N),  and  using  storage  which  is  also  0[N),  and  is,  therefore, 
practically  as  efficient  as  it  can  be. 

We  consider  again  a  first  order  MRF  F  on  a  one  dimensional  lattice  of  length 
N ,  but  we  now  assume  that  F,  6  {— 1, 1}  (there  is  no  loss  of  generality  in  this 
assumption,  since  any  binary  process  can  be  brought  into  this  form  by  a  memoryless 
linear  transformation). 

Using  the  model  for  the  observations  described  in  section  1,  and  reasoning  as 
we  did  in  section  2,  we  find  that  the  MAP  estimation  problem  is  equivalent  to  the 
minimization  of 

tf(/)  ««  +  £*/<(*•).  /<€{-l,l>  (43) 

« 

where  n  is,  as  before,  the  total  number  of  odd  bonds  in  the  configuration  /,  and 


We  now  present  a  method  for  performing  this  minimization,  and  a  proof  of  its 
optimal  performance. 

Description  of  the  Algorithm. 

The  idea  in  which  this  method  is  based  is  the  following: 

We  start  scanning  the  sequence  {g,},  say,  from  the  left,  with  some  initial 
estimate  k  for  fx,  and  set  Iq  —  1.  Whenever  we  process  a  new  observation  gy,  we 
ask  if  we  can  lower  the  energy  by  putting  a  boundary  in  the  best  possible  location 
l  within  the  interval  [10,;].  If  this  is  the  case,  we  put  the  boundary  at  l,  that  is: 

set  fi  =  k,  for  t  €  [lo,l] 

set  fc  =  —k, 
and  set  l0  =  l  +  i. 

Otherwise,  we  just  set  /y  =  /y_i,  and  continue  to  process  the  next  observation. 

When  we  reach  gN,  we  take  /n  as  the  initial  estimate  and  run  the  same  process 
backwards  (in  fact,  we  can  make  this  backward  run  as  soon  as  we  get  the  second 
boundary)  to  get  the  final  solution. 

Formally,  the  algorithm  is  as  follows: 
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Definition  of  Variables. 


i:  Current  position. 

l0 :  Pointer  to  the  beginning  of  the  current  region. 

1:  Current  optima!  location  of  the  boundary  in  the  interval  [Jo.*]. 
k\  Current  estimate  for  /((fo,/)). 

Up\  Energy  increment  associated  with  the  assignment  /([/0,  *])  =  *• 

Um:  Energy  increment  associated  with  the  assignment  /([to,*])  =  -k. 

Ub:  Energy  increment  associated  with  the  assignment  /([/0,  l})  =  k-,f((l,i ]) 
si:  Best  local  (maximum  likelihood)  estimate  for  /,-. 
st'ml:  Best  local  (maximum  likelihood)  estimate  for  j. 

Uvi‘.  Energy  increment  associated  with  the  assignment  /([loif])  =  k. 

Umi\  Energy  increment  associated  with  the  assignment  /([lo.l])  =  —k. 
Utemp'  Temporary  storage  register. 

M:  A  very  large  positive  number. 

Kq:  Switch  indicating  the  method  for  estimating  f\. 
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Algorithm  A1(Kq): 


1:  Initialization. 

Set  l0  =  l  =  V,Up  =  Um  =  Umi  =  0;Ub  =  l;Upl  =  M. 

Set  k  =  1,  if  K0  =  0  and  9  +1(31)  <  9  -1(31) 

-l,  if  K0  =  0  and  9  +i(si)  >  9  -1(31)  ; 

Kq,  if  AT0  7^  0. 

Set  atml  =  k 

2:  Main  Loop:  For  i  from  1  to  N  do: 

Begin 

Set  at  =1,  if  9  +1(ft)  <  *  ; 

-1,  otherwise. 

2.1:  See  if  the  optimal  boundary  location  needs  to  be  updated: 
If  (at  7^  k  and  at  7^  atml  and  Up  —  Upl  —  Um  +  Umi  <  0)  do 
Update  boundary  location: 

Set : 


l  =  i-  1 
Upl  =  Up 
Uml  =  Um 
Ub  =  Up  +  1 

2.2:  Update  energy  increments: 
Set : 


Up  —  Up  +  9  +1  (?») 
Um  —  Um  +  ^  -l(St') 
ub  =  Uh  +  9  +i(?i) 


2.3:  See  if  a  new  boundary  has  to  be  introduced: 

If  (C/4  +  1  <  Up)  do  : 

Introduce  a  new  boundary: 

For  j  from  l0  to  l  do  :  Set  /,-  =  k 

Set : 

k  =  -k 
Iq  —  l  +  l 
Ufemp  —  Up  Up, 

Up  =  Um-  Una 
Um  —  Uttmp 

Up,  —  M 

uh  =  um  + 1 

2.4:  Set  stml  =  si 
End 

3:  See  if  the  last  boundary  has  to  be  introduced: 

If  (1/4  <  Up)  do  : 

3.1:  For  j  from  l0  to  l  set  /,■  =  k. 

3.2:  Set  i0  =  /  + 1. 

3.3:  Set  k  =  —k. 

4:  Fill  the  last  region: 

For  j  from  Iq  to  N  set  /,•  =  k. 

End. 

3.2.  Optimality  of  Algorithm  Al. 

The  optimality  of  this  algorithm  follows  from  the  following  propositions: 

Proposition  1:  Let  S'  —  {1, be  the  optimal  boundary  configuration,  and 
suppose  that  /*,  for  k  <  n  was  detected  by  Al.  Then,  /*+j  will  be  the  next  boundary 


detected  by  Al. 


Proof: 

Suppose  lk  was  detected  by  Al,  and  let  L  be  the  next  boundary  detected.  We  will 
assume  that  L  ^  lk+i  and  arrive  at  a  contradiction.  We  will  consider  three  cases: 

Case  1:  Suppose  Al  detects  L  at  j  <  lk-\. 

Then,  we  must  have  that 


UP(j)  >  UP(L)  +  Um(j)  -  Um[L)  +  2 

and  therefore, 

which  is  a  contradiction. 

Case  2:  Suppose  Al  detects  L  at  j  e  (**+1,^+2]- 

This  means  that  at  j  we  had  that  L  was  the  optimal  location  for  the  boundary.  In 
particular. 


UP(lk+ 1)  +  Um(j)  -  Um(ljc+\)  >  UP(L)  +  Um{j)  -  Um(L) 
which  implies  that 

UP(L)  +  Um(lk+2)  -  Um[L)  <  Up(lk+1)  +  Um[lk+2)  -  Um(lk+i) 
and  therefore, 

U({h . lk,L,lk+2t. <  U(S*) 

which  is  a  contradiction. 

Case  3:  Suppose  that  Al  has  not  detected  any  new  boundary  at  j  =  lk+2  + 1. 
Then,  we  must  have: 

Up{lk+ 2  +  1)  <  U„(lk+2  +  1)  +  1 

which  means  that 

which  is  again  a  contradiction.  1 

Proposition  2:  If  Al  runs  from  left  to  right  starting  at  a  point  fo.  and  generates 
the  boundaries  then,  lj  €  5*  (the  set  of  boundaries  of  the  optimal 

configuration)  for  j  >  2. 


Proof: 

Let  f* ,/m  be  the  optimal  configuration,  and  the  one  generated  by  Al,  respectively. 


Let 

Lo  =  8up{>  6  5*  :  j  <  li} 

L  =  inf{;  €  S*  :  j  >  ii} 

If  Lo  =  la,  we  apply  proposition  1  and  finish  the  proof;  so,  let  us  assume  that 
Lo  7^  lo,  and  that  l\  was  detected  at  t.  We  have  two  cases: 

Case  1:  Lo  >  lo.  We  claim  that  in  this  case,  h  €  5*.  and  therefore,  by  proposition 
1,  lj  €  S*  for  j  >  1.  To  prove  this  claim,  we  consider  two  subcases: 

Case  1-a:  /*((i0,Lo))  ^  /xi((io,Lo)). 

In  this  case,  we  have: 

2  +  CWO  -  Um{h)  +  <  t/P(0 

and  therefore, 

2  +  Um( i)  -  Um(h)  +  Up(h)  -  Up[Lq)  <  Up(i)  -  Up(Lo ) 

which  implies  that  l\  €  5*. 

Case  1-b:  /*((i0,L«))  =  /^i((/o,Lo)). 

Suppose  li  £  S* .  We  have  that,  at  location  i, 

Up(h)  +  Um(i)  -  Um{h)  +  2  <  Up(Lo)  +  Um(i)  -  Um{Lo)  +  2 

since  otherwise,  Lo  would  have  been  a  better  location  for  the  boundary.  However, 
this  implies  that 

£/„(!,)  +  Um(L)  -  Um(h)  <  Up{Lo)  +  Um(L)  -  Um(h) 

which  means  that  we  can  improve  5*  by  moving  Lo  to  li,  which  is  a  contradiction. 
Case  2:  Lo  <  io. 

Again,  we  consider  two  subcases: 

Case  2-a:  /‘((Lo,/0))  =  /yii((Lo,/o)). 

Let  U+,  U-  be  the  energy  increments  with  respect  to  Lo: 

=  i:  ♦+*(»,) 

U-{i)  =  £ 
i=U 


Note  that 


Up(i)  =  U+{i)  -  U+[k)  and 


UM(i)  =  U-(i)-U-(l0) 
Since  1 1  was  detected  at  t,  we  have: 


2  +  Um[x)  -  Um(h)  +  Up(h)  <  U„(i) 

and  therefore, 

2  +  U-(i)  -  U-(h)  +  U+{h)  <  U+(i) 
which  means  that  /i  e  5*. 

Case  2-b:  /*((!*,  Z0))  ^  />ti((A), /©))• 

Using  the  same  definitions  for  {/+,£/_,  we  have  that,  by  the  optimality  of  S*.  for 
some  j  >  L, 

U-(j)  -  U-(L)  +  U+(L)  +  2  <  U+(j) 

and  therefore, 


U.(j)  -  U-(L)  +  U+(L)  -  U+(l i)  +  2  <  t/+(L)  -  U4h) 

which  means  that  if  A1  detects  llt  it  must  detect  L  too,  unless  it  detected  h  first, 
but  in  this  case  we  have  that,  for  some  p  <  j, 

U-(p)  -  U-(l2 )  +  U+(l2)  -  U+(l i)  +  2  <  £/+(P)  -  lM*i) 

which  implies  that  he  S*.  This  completes  the  proof.  | 

It  should  be  clear  that  these  results  can  be  easily  extended  to  the  case  where 
A1  runs  backwards  (from  right  to  left).  With  this  extension,  we  get  the  following 
complete  optimal  procedure: 

Algorithm  A2: 

1:  Run  Al  from  left  to  right.  Detect  {h, . . /„}. 

2:  Run  Al  backwards  (starting  from  h •  Get  either 

or  {Ii  •>!**} 

In  either  case,  this  is  the  optimal  solution. 

The  only  thing  that  remains  to  be  proved  is  that  the  determination  of  the 
optimal  location  for  a  boundary  is  in  fact  performed  by  step  2.1  of  Al.  We  have 
the  following: 

Proposition  3:  Suppose  that  Al  detected  a  boundary  at  (or  started  from)  l0.  Then, 
the  optimal  location  l  of  the  next  boundary  has  to  be  updated  only  at  places  where 
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at  =  -k  and  stml  =  k.  Suppose  *  is  one  such  place.  The  optimal  location  will  be: 


ft  - 1,  if  Up{i  -  l)  -  Um(i  -  1)  <  Upi  -  Um, 
U,  (the  current  value)  otherwise 


Proof: 

First,  we  note  that  a  necessary  and  sufficient  condition  for  l  to  be  the  optimal 
location  of  the  boundary  at  the  point  t  is  that,  for  j  e  [to.  *  —  1] : 

UP{1)  +  Um(i)  -  Um(l)  <  Up(j)  +  Um(i)  -  Um(j) 

or  equivalently, 

UP(l)  -  Um(l)  <  Up(j)  -  Um(j) 

Suppose  l  was  the  optimal  location  at  t  —  1,  and  we  process  observation  t.  We 
consider  several  cases: 

Case  1:  siml  =  —k 

In  this  case,  we  show  that  l  remains  the  optimal  location: 

By  construction,  we  have  that: 

V,(i  -l)  =  Up{i-  2)  +  ¥+*(*_,) 

Um[}  —  1)  =  Um[i  —  2)  +  ty_*(0i_i) 

Since  siml  =  —k  we  have  that. 


1)  -  >  0 


and  therefore, 


UP(i  —  1)  —  Um[i  —  1)  —  Up(i  —  2)  —  U,n{i  —  2)  4-  ♦+t(g,_i)  —  W — )  > 


>  Up(i  -  2)  -  Um(i  -  2)  >  UP{1)  -  Um(l) 

so  that  l  remains  the  optimal  location. 

Case  2:  siml  —  k 
In  this  case  we  have  that 

Up{i  -  1)  -  Um(i  -  1)  <  Up(i  -  2)  -  Um(i  -  2) 

This  means  that  the  minimal  value  for  Up{i )  -  l/m(i)  on  a  block  for  which  si  =  k 
will  be  obtained  at  the  extremal  point  where  at  =  -k  and  stml  =  k,  and  since,  by 
theorem  1,  this  is  the  only  point  where  a  boundary  might  be  placed,  it  is  sufficient 
to  update  the  optimal  location  only  at  these  points.  So,  suppose  stml  =  k  and 
at  =  — Ac. 
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If 


Upl-Uml<Up(i-l)-Um(i-l), 


then, 

Up,  -  Uml  <  Up(j)  -  Um(j)  for  j  6  [k,  i  - 1] 

because  l  was  the  optimal  location  outside  the  last  block  where  si  =  k.  By  the  same 
token,  it  is  clear  that  if 


Upi  -  Um,  >  Up[i  -  1)  -  Um(i  -  1), 


the  new  optimal  location  will  be  *  -  l.i 


3.3.  Experimental  Evaluation  of  Simulated  Annealing. 


Once  we  have  an  efficient  method  for  getting  the  optimal  estimate  for  the 
process  /,  it  is  interesting  to  use  it  to  evaluate  the  performance  of  other  algorithms, 
such  as  simulated  annealing  (see  [Kl]  for  a  description  of  the  method,  and  [Gl.Ml] 
for  examples  of  its  application  to  MRF  estimation  problems). 

In  order  to  make  this  comparison,  we  implemented  both  algorithms  in  a 
computer,  and  performed  some  numerical  experiments.  The  field  /  was  generated 
using  Metropolis  algorithm  [M2.G1.G2],  and  the  observations  g  by  adding  to  /  an 
independent  white  Gaussian  random  process  of  given  power  spectral  density  a2. 
For  the  annealing  schedule,  we  used  the  formula  (see  [Gl]): 


k-0  ln2 
ln(j  + 1) 


(44) 


where  0  is  the  natural  temperature  of  the  field  /;  T  is  the  annealing  temperature, 
and  j  is  the  iteration  number.  By  a  trial  and  error  procedure  we  found  that  the 
optimal  value  for  the  constant  k  was  0.5. 

Figure  2  shows  the  results  of  a  typical  experiment  for  a  lattice  of  50  points 
(we  used  0  —  2  and  a  =  0.8  for  the  parameters  of  the  processes).  The  top  row 
shows  the  original  field  /  (  black  squares  mean  /,■  =  1,  and  white  ones,  /,  =  0); 
the  second  row,  the  maximum  likelihood  estimate  obtained  by 


if  w  -  i  >  0 

otherwise 


The  third  row  is  the  optimal  estimate  obtained  by  A2,  and  the  last  row,  the  result 
obtained  by  simulated  annealing  after  50  global  iterations.  The  corresponding  values 
for  the  energy  (equation  (2))  were: 

(/(/)  =  52.083 

U{Jml)  =  43.38 
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Figure  1  (see  text). 


U(fM)  =  29.34 


U(/sa)  =  29.5 

It  is  interesting  to  note  that,  for  this  problem,  the  convergence  of  the  simulated 
annealing  algorithm  is  extremely  fast.  We  can  get  a  near  optimal  (and  in  many 
cases  optimal)  solution  in  less  than  10  iterations,  with  the  appropriate  setting  of  the 
constant  k  in  equation  (44). 


4.  Discussion. 


We  have  presented  two  deterministic  algorithms  for  finding  the  optimal  (MAP) 
estimate  of  a  binary,  one  dimensional  MRF  from  noisy  observations. 

A2,  the  algorithm  presented  in  section  3,  is  without  doubt  the  most  efficient, 
and  its  complexity  ( 0(N ))  is  certainly  optimal  for  this  problem.  However,  it  is  very 
difficult  to  extend  it,  so  that  it  can  be  applied  to  other  related  problems  (such  as 
those  presented  in  section  2.6)  which,  in  principle,  can  be  solved  with  the  dynamic 
programming  approach  that  we  presented  in  section  2.  In  these  cases,  however, 
the  absence  of  results  of  the  type  of  theorems  1  and  2  (which  provided  us  with  a 
substantial  reduction  of  the  search  space,  and  with  an  efficient  stopping  criterion  in 
the  binary  case)  make  the  application  of  the  DP  algorithm  computationally  more 
expensive,  although  still  perfectly  feasible.  It  would  be  interesting  to  implement 
these  algorithms  and  use  them  (as  we  did  in  section  3.3  for  the  binary  case)  as 
benchmarks  for  the  evaluation  of  algorithms  whose  optimal  performance  is  still 
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uncertain,  such  as  the  proposed  extensions  to  simulated  annealing  for  handling 
continuous-valued  variables  (G2|. 

An  important  open  question  is  whether  efficient  deterministic  algorithms  can 
be  designed  for  the  estimation  of  two  dimensional  MRF’s.  A  direct  extension  of 
the  techniques  we  have  presented  here  is  not  posible;  the  main  difficulty  in  the  two 
dimensional  case  is  that  the  geometry  of  the  boundaries  between  uniform  regions 
(which  in  the  one  dimensional  case  are  simply  points),  causes  a  combinatorial 
explosion  of  the  number  of  possible  configurations  compatible  with  a  given  total 
boundary  length.  Our  results,  however,  show  that  in  principle  it  is  possible  to  exploit 
the  structure  of  the  energy  function  of  a  particular  class  of  estimation  problems  to 
design  efficient  algorithms  for  its  global  minimization. 
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Notes: 

[1]  In  the  language  of  statistical  mechanics,  the  "state'*  of  a  system  at  a  given 
temperature  is  a  probability  measure  (the  Gibbs  measure)  defined  on  the  phase 
space  of  the  system  (in  our  case,  {-1, 1}^).  Since  at  zero  temperature  the  Gibbs 
measure  becomes  a  delta  function  at  the  global  minimum  of  the  corresponding 
energy  function  (assuming  it  is  unique),  the  global  minimizer  /*  completely  specifies 
the  ground  state. 

(2]  Since  we  are  using  a  white  noise  model,  the  maximum  likelihood  estimate  for  /  is 
obtained  by  the  independent  maximization  of  each  term  of  the  likelihood  function: 

i«tz,P(e  !/)]  =  -£ fj  €  {ko, fci>. 
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